
Reviews: Lifelong Inverse Reinforcement Learning

Neural Information Processing Systems

Summary: This paper considers the problem of lifelong inverse reinforcement learning, where the goal is to learn a set of reward functions (from demonstrations) that can be applied to a series of tasks. The authors propose to do this by learning and continuously updating a shared latent space of reward components, which are combined with task-specific coefficients to reconstruct the reward for a particular task. The derivation of the algorithm closely mirrors that of the Efficient Lifelong Learning Algorithm (ELLA) (citation [33]). Although ELLA was formulated for supervised learning, variants such as PG-ELLA (not cited in this paper; Ammar et al., "Online Multi-task Learning for Policy Gradient Methods") have applied the same derivation procedure to extend the original algorithm to the reinforcement learning setting. This paper is another extension of ELLA, to the inverse reinforcement learning setting: instead of sharing policies via a latent space, it shares reward functions.
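The factorization the review describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names (`n_features`, `n_components`, `L`, `s_t`) are assumed notation, and the example only shows how a task's reward parameters would be reconstructed from a shared basis and task-specific coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 8      # dimension of the reward feature vector (illustrative)
n_components = 3    # size of the shared latent basis (illustrative)

# Shared latent basis of reward components, updated as tasks arrive.
L = rng.normal(size=(n_features, n_components))

# Sparse task-specific coefficients for one task t.
s_t = np.array([0.5, 0.0, -1.2])

# Reconstructed reward parameters for task t: theta_t = L @ s_t.
theta_t = L @ s_t

def reward(features, theta):
    """Linear reward over state features, as in feature-based IRL."""
    return features @ theta

state_features = rng.normal(size=n_features)
r = reward(state_features, theta_t)
```

The key design point is that `L` is common to all tasks, so demonstrations from earlier tasks shape the basis that later tasks draw on, while each `s_t` stays small and task-specific.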


Lifelong Inverse Reinforcement Learning

Mendez, Jorge, Shivkumar, Shashank, Eaton, Eric

Neural Information Processing Systems

Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. However, even for a single task, LfD may require numerous demonstrations. For versatile agents that must learn many tasks via demonstration, this process would substantially burden the user if each task were learned in isolation. To address this challenge, we introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the number of demonstrations required. As one solution to this problem, we propose the first lifelong learning approach to inverse reinforcement learning, which learns consecutive tasks via demonstration, continually transferring knowledge between tasks to improve performance.
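The task-by-task transfer described in the abstract can be sketched as a simple loop. This is a hedged simplification under assumed names: `base_irl` stands in for any single-task IRL solver, and the basis refresh is a naive full refit by least squares, whereas the actual algorithm performs an efficient incremental update with sparsity regularization.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_components = 6, 2   # illustrative sizes

# Shared basis of reward components, carried across tasks.
L = rng.normal(scale=0.1, size=(n_features, n_components))

def base_irl(task_id):
    # Stub for a single-task IRL solver: returns a reward-parameter
    # estimate from that task's demonstrations. Purely illustrative.
    rng_t = np.random.default_rng(task_id)
    return rng_t.normal(size=n_features)

thetas = []
for task_id in range(5):                  # tasks arrive consecutively
    theta_hat = base_irl(task_id)         # per-task estimate from demos
    # Fit this task's coefficients against the current shared basis.
    s_t, *_ = np.linalg.lstsq(L, theta_hat, rcond=None)
    thetas.append(theta_hat)
    # Simplified basis refresh: refit L to all task estimates seen so
    # far (a full refit for illustration, not the incremental update).
    S = np.stack([np.linalg.lstsq(L, th, rcond=None)[0] for th in thetas])
    Theta = np.stack(thetas)              # (n_tasks, n_features)
    L = np.linalg.lstsq(S, Theta, rcond=None)[0].T
```

Because `L` is refined after every task, a new task with few demonstrations can lean on structure recovered from earlier ones, which is the source of the claimed reduction in demonstrations.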